Abstract

We previously showed that close relatives of human coronavirus (HCoV)-229E exist in African bats. The small sample and limited genomic characterizations have prevented further analyses so far. Here, we tested 2,087 fecal specimens from 11 bat species sampled in Ghana for HCoV-229E-related viruses by RT-PCR. Only hipposiderid bats tested positive. To compare the genetic diversity of bat viruses and HCoV-229E, we tested historical isolates and diagnostic specimens sampled globally over 10 years. Bat viruses were five- to sixfold more diversified than HCoV-229E in RNA-dependent RNA polymerase (RdRp) and Spike genes. In phylogenetic analyses, HCoV-229E strains were monophyletic and not intermixed with animal viruses. Bat viruses formed three large clades in close and more distant sister relationship. A recently described 229E-related alpaca virus occupied an intermediate phylogenetic position between bat and human viruses. According to taxonomic criteria, human, alpaca and bat viruses form a single CoV species showing evidence for multiple recombination events. HCoV-229E and the alpaca virus showed a major deletion in the Spike S1 region compared to all bat viruses. Analyses of four full genomes from 229E-related bat CoVs revealed an eighth open reading frame (ORF8) located at the genomic 3’-end. ORF8 also existed in the 229E-related alpaca virus. Re-analysis of HCoV-229E sequences showed a conserved transcription regulatory sequence preceding remnants of this ORF, suggesting its loss after acquisition of a 229E-related CoV by humans. These data suggested an evolutionary origin of 229E-related CoVs in hipposiderid bats, hypothetically with camelids as intermediate hosts preceding the establishment of HCoV-229E.

Importance

The ancestral origins of major human coronaviruses (HCoV) likely involve bat hosts. Here, we provide conclusive genetic evidence for an evolutionary origin of the common cold virus HCoV-229E in hipposiderid bats by analyzing a large sample of African bats and characterizing several bat viruses on a full genome level. Our evolutionary analyses show that animal and human viruses are genetically closely related, can exchange genetic material and form a single viral species. We show that the putative host switches leading to the formation of HCoV-229E were accompanied by major genomic changes including deletions in the viral spike glycoprotein gene and loss of an open reading frame. We re-analyze a previously described genetically related alpaca virus and discuss the role of camelids as potential intermediate hosts between bat and human viruses. The evolutionary history of HCoV-229E likely shares important characteristics with that of the recently emerged highly pathogenic MERS-Coronavirus.

Introduction

Coronaviruses (CoV) are enveloped viruses with a single-stranded, positive-sense contiguous RNA genome of up to 32 kilobases. The subfamily Coronavirinae contains four genera termed Alpha-, Beta-, Gamma- and Deltacoronavirus. Mammals are predominantly infected by alpha- and betacoronaviruses, while gamma- and deltacoronaviruses mainly infect avian hosts (1, 2).

Four human coronaviruses (HCoVs) termed HCoV-229E, -NL63, -OC43 and -HKU1 circulate in the human population and mostly cause mild respiratory disease (3). HCoV-229E is frequently detected, in up to 15% of specimens taken from individuals with respiratory disease (4-6). Although HCoV-229E can be detected in fecal specimens, HCoVs generally do not seem to play a role in acute gastroenteritis (7-9). Severe respiratory disease with high case-fatality rates is caused by severe acute respiratory syndrome (SARS)-CoV and Middle East respiratory syndrome (MERS)-CoV, which emerged recently. HCoV-229E and HCoV-NL63 belong to the genus Alphacoronavirus, while HCoV-OC43, HCoV-HKU1, SARS- and MERS-CoV belong to the genus Betacoronavirus (1, 10).

In analogy to major human pathogens including Ebola virus, rabies virus, mumps virus and hepatitis B and C viruses (11-16), the evolutionary origins of SARS- and MERS-CoV were traced back to bats (17-22). The genetic diversity of bat CoVs described over the last decade exceeds the diversity in other mammalian hosts (2). This has led to speculations on an evolutionary origin of all mammalian CoVs in bat hosts (23). Bats share important ecological features potentially facilitating virus maintenance and transmission, such as close contact within large social groups, longevity, and the ability of flight (13, 24).

How humans become exposed to remote wildlife viruses is not always clear (25). Human infection with SARS-CoV and MERS-CoV was likely mediated by peri-domestic animals. For SARS-CoV, the suspected sources of infection were carnivores (26). Preliminary evidence suggested that these carnivore hosts may also have adapted SARS-CoV for human infection (27). For MERS-CoV, camelids are likely intermediate hosts, supported by circulation of MERS-CoV in camel herds globally and for prolonged periods of time (28-30). Whether MERS-CoV only recently acquired the capacity to infect humans in camelids is unclear.

The evolutionary origins of HCoV-229E are uncertain. In 2007, a syndrome of severe respiratory disease and sudden death was recognized in captive alpacas from the U.S. (31) and an alphacoronavirus genetically closely related to HCoV-229E was identified as the causative agent (32).

In 2009, we detected viruses in fecal specimens from 5 of 75 hipposiderid bats from Ghana and showed that these bat viruses were genetically related to HCoV-229E by characterizing their partial RNA-dependent RNA polymerase (RdRp) and Nucleocapsid genes (33). Lack of specimens containing high CoV RNA concentrations so far prevented a more comprehensive characterization of those bat viruses to further address their relatedness to HCoV-229E. Here, we tested more than 2,000 bats from Ghana for CoVs related to HCoV-229E. We describe highly diversified bat viruses on a full genome level and analyze the evolutionary history of HCoV-229E and the genetically related alpaca CoV.

Materials and Methods

Bat and human sampling

Bats were caught in the Ashanti region, central Ghana, during 2009-2011 as described previously (21). Archived anonymized respiratory specimens derived from patients sampled between 2002 and 2011 were obtained from Hong Kong/China, Germany, The Netherlands, Brazil and Ghana.

RNA purification, coronavirus detection and characterization

RNA was purified from approximately 20 mg of fecal material suspended in 500 µL RNAlater stabilizing solution using the MagNA Pure 96 system (Roche, Penzberg, Germany). Elution volumes were 100 µL. Testing for CoV RNA was done using a real-time RT-PCR assay designed to allow detection of HCoV-229E and all genetically related bat CoVs known from our pilot study (33). Oligonucleotide sequences were CoV229Elike-F13948m, TCYAGAGAGGTKGTTGTTACWAAYCT; CoV229Elike-P13990m, FAM (6-carboxyfluorescein)-TGGCMACTTAATAAGTTTGGIAARGCYGG-BHQ1 (Black Hole Quencher 1); and CoV229Elike-R14138m, CGYTCYTTRCCAGAWATGGCRTA. Testing used the SSIII RT-PCR Kit (Life Technologies, Karlsruhe, Germany) with the following cycling protocol in a LightCycler 480 (Roche, Penzberg, Germany): 20 min at 50 °C for reverse transcription, followed by 3 min at 95 °C and 45 cycles of 15 s at 95 °C, 10 s at 58 °C and 20 s at 72 °C. CoV quantification relied on cRNA in vitro transcripts generated from TA-cloned peri-amplicons using the T7-driven Megascript kit (Life Technologies, Heidelberg, Germany) as described previously (34). Partial RdRp gene sequences from real-time RT-PCR-positive specimens were obtained as described previously (18). Full CoV genomes and Spike gene sequences were generated for those specimens containing the highest CoV RNA concentrations using sets of nested RT-PCR assays (primers available upon request) located along the HCoV-229E genome and designed to amplify small sequence islets. Sequence islets were connected by bridging long-range nested PCR using strain-specific primers (available upon request) and the Expand High Fidelity kit (Roche) on cDNA templates generated with the Superscript III reverse transcriptase (Life Technologies).
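
The forward and reverse primers listed above contain IUPAC ambiguity codes (Y, K, W, R), so each oligonucleotide is in practice a mixture of sequence variants. The following minimal Python sketch, which is an illustration and not part of the original workflow, counts and enumerates those variants. The function names `degeneracy` and `expand` are hypothetical. The probe is left out because it contains an inosine (I), which is not an IUPAC ambiguity code.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences as given in the text.
FORWARD = "TCYAGAGAGGTKGTTGTTACWAAYCT"  # CoV229Elike-F13948m
REVERSE = "CGYTCYTTRCCAGAWATGGCRTA"     # CoV229Elike-R14138m


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    options = (IUPAC[base] for base in primer.upper())
    return ["".join(variant) for variant in product(*options)]


if __name__ == "__main__":
    for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
        print(name, len(primer), "nt,", degeneracy(primer), "variants")
```

Under these assumptions the forward primer carries four twofold-degenerate positions and the reverse primer five, which is a quick way to check how broad a primer pool is before ordering or modelling it.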

Phylogenetic analyses

Bayesian phylogenetic reconstructions were made using MrBayes V3.1 (35) under assumption of a GTR+G+I nucleotide substitution model for partial RdRp sequences and the WAG amino acid substitution model for translated open reading frames (ORFs). Two million generations were sampled every 100 steps, corresponding to 20,000 trees of which 25% were discarded as burn-in before annotation using TreeAnnotator V1.5 and visualization using FigTree V1.4 from the BEAST package (36). Neighbor-joining phylogenetic reconstructions were made using MEGA5.2 (37) with a percentage nucleotide distance model, the complete deletion option and 1,000 bootstrap replicates. Genome comparisons were made using MEGA5.2 (37) and SSE V1.1 (38), and recombination analyses were made using SimPlot V3.5 (39).
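
The percentage nucleotide distance with complete deletion mentioned above can be illustrated with a short Python sketch. This is a simplified stand-in and not the MEGA implementation: it assumes pre-aligned sequences of equal length, and the function names and the toy alignment are hypothetical.

```python
def complete_deletion(aligned):
    """Drop every alignment column in which any sequence has a gap
    or an ambiguous base (the idea behind 'complete deletion')."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    keep = [i for i in range(length)
            if all(seq[i] in "ACGT" for seq in aligned)]
    return ["".join(seq[i] for i in keep) for seq in aligned]


def p_distance(a, b):
    """Percentage of sites at which two aligned sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    differences = sum(x != y for x, y in zip(a, b))
    return 100.0 * differences / len(a)


if __name__ == "__main__":
    # Toy alignment, not real viral sequence data.
    toy = ["ACGTACGTAC", "ACGTTCGTAC", "AC-TACGTAA"]
    clean = complete_deletion(toy)
    for i in range(len(clean)):
        for j in range(i + 1, len(clean)):
            print(i, j, round(p_distance(clean[i], clean[j]), 2))
```

Removing gapped columns across the whole alignment first means that every pairwise distance is computed over the same set of sites, at the cost of discarding data when a single sequence is incomplete.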

Results

Specimens from 2,087 bats belonging to 11 species were available for PCR testing. Table 1 provides details on the overall sample composition and detection rates in individual bat species. Only bats belonging to the family Hipposideridae tested positive, in 81 of 1,853 specimens (4.4%). All positive-testing bats had been morphologically identified in the field as either Hipposideros cf. ruber or H. abae. Those were the most abundant species within the sample. No HCoV-229E-related RNA was detected in the 17 available specimens from H. jonesi and H. cf. gigas.

An 816-nucleotide (nt) fragment from the RdRp gene was obtained from 41 of the 81 positive specimens (GenBank accession nos. KT253259-KT253299). This fragment was used for further analysis as the 816 nt sequence yields improved resolution in inference of phylogeny as compared to shorter sequences derived from RT-PCR screening of field-derived samples (2). To expand the available genomic data for HCoV-229E, the 816 nt RdRp fragment was also sequenced from 23 HCoV-229E strains from patients sampled between 2002 and 2011 in China, Germany, The Netherlands, Brazil, and Ghana. In addition, the 816 nt RdRp fragment was sequenced from two historical HCoV-229E strains isolated in 1965 and the 1980s (40) (GenBank accession nos. KT253300-KT253323). In analogy to the official taxonomic designation SARS-related CoV including human SARS-CoV and related CoVs from other animals (1), we hereafter restrict usage of the term HCoV-229E to the human virus and refer to the animal viruses as 229E-related CoV. Figure 1A shows a Bayesian phylogeny of the partial RdRp gene. The bat virus diversity we observed in our pilot study (represented by viruses Buoyem344 and Kwamang19) was expanded greatly. A phylogenetically basal virus termed Kwamang8 obtained within our pilot study was not detected again, although the present study contained specimens from the same cave and bat species. All human strains occupied an apical phylogenetic position and were not intermixed with any of the animal viruses. The recently described alpaca 229E-related CoV (32) clustered with two viruses obtained from hipposiderid bats in a parallel study from our groups in the Central African country Gabon (41). The two Gabonese bat-associated viruses differed from the alpaca 229E-related CoV by only 3.2% in nucleotide sequence within the RdRp fragment. Hipposiderid bat CoVs were sorted neither by sampling sites nor by their host species in their RdRp genes. Overall, bat 229E-related CoVs sampled over 3 years differed up to 13.5% in their nt and 3.3% in their amino acid (aa) sequences. Although the HCoV-229E dataset used for comparison was sampled over 50 years, the human-associated viruses showed 5- to 10-fold less genetic diversity than bat viruses with only 1.4% nt and 0.7% aa variation. Because of the small sequence variation in HCoV-229E, Figure 1A contains only nine representative HCoV-229E strains. The neighbor-joining phylogeny shown in Figure 1B represents the high sequence identity between all HCoV-229E strains determined in this study.

JVI (Journal of Virology) Accepted Manuscript Posted Online

To analyze to which extent bat 229E-related CoVs show genetic variation, the Spike gene encoding the viral glycoprotein was characterized from 15 representative bat viruses (labeled with a triangle in Figure 1A). Figure 1C shows a Bayesian phylogenetic tree of the bat 229E-related CoV Spike gene sequences and HCoV-229E full Spike sequences sampled over 50 years. The bat viruses formed three genetically diverse lineages, of which two phylogenetically basal lineages contained bat viruses only. These lineages were sorted according to their sampling sites Kwamang (abbreviated KW) and Akpafu Todzi (abbreviated AT). A third lineage contained closely related bat viruses obtained from three different sample sites separated by several hundred kilometers (Buoyem, Kwamang and Forikrom) (21). These data suggested co-circulation of different Spike gene lineages within sampling sites as well as the existence of separate lineages between sites. However, the small number of viruses characterized from the phylogenetically basal bat clades 1 and 2 implies that caution should be taken in assertions on geographically separated Spike gene lineages. The alpaca 229E-related CoV and all HCoV-229E strains clustered in apical phylogenetic position compared to the bat viruses. The most closely related bat viruses from lineage 1 differed from HCoV-229E by 8.4-13.7% aa sequence distance. The two other bat virus lineages were less related to HCoV-229E with 30.6-33.0% aa sequence distance.

Topologies of the Bayesian phylogenetic reconstructions of RdRp and Spike genes from bats and the alpaca were not congruent, compatible with past recombination events across animal 229E-related CoVs. The high similarity of the RdRp gene of human HCoV-229E strains did not allow comparisons of the RdRp-based with the Spike-based topology. To further investigate the genomic relationships of bat 229E-related CoVs and HCoV-229E, the full genomes were determined directly from fecal specimens from four representative bat viruses (labeled with circles in Figures 1A and C). Figure 2A shows that bat 229E-related CoV genomes comprise 28,014-28,748 nt, which exceeds the length of known HCoV-229E strains by 844-1,479 nt. As shown in Figure 2B, HCoV-229E and all bat viruses were closely related within the putative ORF1ab. This allowed the delineation of non-structural proteins (nsp) 1-16 for all bat viruses in analogy to HCoV-229E. Table 2 provides details on length and cleavage sites of the predicted nsp. Sequence identity in seven concatenated nsp is used by the International Committee on Taxonomy of Viruses (ICTV) for CoV species designation (1). As shown in Table 3, the four fully sequenced bat viruses showed translated aa sequence identities of 93.3-97.1% with HCoV-229E. This was well above the 90% threshold established by the ICTV, indicating that all bat 229E-related CoVs and HCoV-229E form a single species. Bat virus Kwamang8, which formed a phylogenetically basal sister-clade to the other bat viruses and HCoV-229E, could not be sequenced on a full genome level. The aa sequence of the partial RdRp gene of Kwamang8 differed by only 3.3% from other bat viruses and HCoV-229E. Based upon previous comparisons of CoV RdRp sequences for tentative species delineation (2, 18), Kwamang8 forms part of the same species as the other bat viruses and HCoV-229E. This CoV species would also include the recently described alpaca 229E-related CoV (32), which showed 96.9-97.2% aa sequence identity with HCoV-229E and 94.2-97.8% with the bat viruses in the seven concatenated nsp domains.
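
The species assignment described above compares aa identity in concatenated domains against the 90% threshold. A minimal Python sketch of that comparison follows, assuming already aligned and concatenated aa sequences. The function names and the toy peptide strings are hypothetical, and the real analysis used dedicated alignment software.

```python
ICTV_THRESHOLD = 90.0  # percent aa identity, as stated in the text


def percent_identity(a, b):
    """Percent identical residues over aligned positions without gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def same_species(identity_percent, threshold=ICTV_THRESHOLD):
    """True if the identity exceeds the species demarcation threshold."""
    return identity_percent > threshold


if __name__ == "__main__":
    # Identity range reported in the text for the four bat viruses.
    for value in (93.3, 97.1):
        print(value, same_species(value))
    # Toy aligned peptides, not real nsp sequences.
    print(percent_identity("MKLVA-", "MKIVAG"))
```

Both ends of the reported 93.3-97.1% range pass the check, which is the arithmetic behind the single-species statement.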

As shown in Figure 2A, all seven open reading frames (ORFs) known from HCoV-229E were found in bat 229E-related CoVs in the sequence ORF1a/1b-Spike-ORF4-Envelope-Membrane-Nucleocapsid. Amino acid identities between predicted ORFs of the bat viruses and HCoV-229E ranged from the 67.2-91.6% described above for the translated Spike genes to 88.3-94.6% (ORF1ab), with bat virus lineage 1 consistently showing highest aa sequence identities. Table 4 provides details for all sequence comparisons.

We looked for additional support for the existence of these predicted ORFs by analyzing the sequence context at their 5’-termini. This is because in CoVs, ORFs are typically preceded by highly conserved transcription regulatory sequence (TRS) elements (42). All putative ORFs from bat 229E-related CoVs showed high conservation of the typical HCoV-229E TRS core sequence UCU C/A AACU and adjacent bases. Table 5 provides details on all putative TRS elements within bat 229E-related CoV genomes.
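
Locating candidate TRS core elements amounts to a motif search for UCU C/A AACU. The Python sketch below shows one way to do this with a regular expression. It is an illustration under stated assumptions, not the authors' procedure: `find_trs` is a hypothetical helper, the toy sequence is invented, and a real analysis would also inspect the adjacent bases and the distance to the downstream start codon.

```python
import re

# Core motif from the text: UCU, then C or A, then AACU.
# The lookahead wrapper also reports matches that overlap by one base.
TRS_CORE = re.compile(r"(?=(UCU[CA]AACU))")


def find_trs(sequence):
    """Return (0-based start, matched motif) for each TRS core hit.
    Accepts RNA or cDNA input; T is converted to U before searching."""
    rna = sequence.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in TRS_CORE.finditer(rna)]


if __name__ == "__main__":
    toy = "GGAUCUCAACUAAAUGGCUUCUAAACUGG"  # invented example sequence
    for start, motif in find_trs(toy):
        print(start, motif)
```

Each hit is only a candidate: the core motif is short, so matches can occur by chance in a genome of about 28,000 nt.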

Figure 3A shows Bayesian phylogenetic trees reconstructed for all individual ORFs. The alpaca 229E-related CoV clustered in intermediate position between HCoV-229E and the bat viruses in the ORF1ab and Spike, but with bat viruses only in Membrane, Envelope, Nucleocapsid, and ORF4. The divergent topologies again suggested recombination events in 229E-related CoVs. To find further evidence for recombination events and identify genomic breakpoints, 229E-related CoVs were analyzed by bootscanning. As shown in Figure 3B, bootscanning supported multiple recombination events involving HCoV-229E, bat 229E-related CoVs and the alpaca 229E-related CoV. Major recombination breakpoints occurred within the ORF1ab and the beginning of the Spike gene, compatible with previous analyses of CoV recombination patterns (2) and the divergent topologies between the RdRp and Spike genes noted above. Bootscanning also suggested a potential genomic breakpoint within the Spike gene, mapping to the borders of the S1 (associated with receptor binding) and S2 domains (associated with membrane fusion). This would be consistent with previous evidence supporting intra-Spike recombination events in bat-associated CoVs (43). To obtain further support for potential intra-Spike recombination events, separate phylogenetic reconstructions for the S1 and the S2 domains were made. As shown in Figure 3B, these phylogenetic reconstructions supported recombination events involving the alpaca 229E-related CoV and HCoV-229E, but not the bat 229E-related CoVs. In the S1 domain, the alpaca 229E-related CoV clustered with clinical HCoV-229E strains, while the HCoV-229E reference strain inf-1 isolated in 1962 clustered in phylogenetically basal sister relationship. Only in the S2 domain was the intermediate position of the alpaca virus compared to bat and human 229E-related CoVs, noted before in comparisons of the full Spike, maintained. These data may hint at recombination events between HCoV-229E and the alpaca virus and further supported genetic compatibility between these two viruses belonging to one CoV species.

Three major differences existed between HCoV-229E, the alpaca 229E-related CoV and the bat 229E-related CoVs. The first of these differences occurred in the putative ORF4. Similar to HCoV-229E strains characterized from clinical specimens, a contiguous ORF4 existed in all bat viruses that was 156-164 aa residues longer than the alpaca 229E-related CoV ORF4. Re-analysis of the putative ORF4 sequence of the alpaca 229E-related CoV showed that this apparently shorter ORF4 was due to an insertion of a single cytosine residue at position 181. Without this putative insertion, the alpaca 229E-related CoV ORF4 showed the same length as homologous ORFs in bat 229E-related CoVs and HCoV-229E. Since the HCoV-229E ORF4 is known to accumulate mutations in cell culture (40), the apparently truncated ORF in the alpaca 229E-related CoV isolate may thus not occur in vivo. The extended ORF4 of the alpaca 229E-related CoV would be most closely related to bat viruses from clade 1 with 5.5% aa sequence distance, compared to at least 8.8% distance from HCoV-229E strains.
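
The effect of a single-nucleotide insertion on ORF length, as described above for ORF4, can be demonstrated with a toy example in Python. The sequence, the insertion position and the helper names below are hypothetical and chosen only to show how a one-base insertion shifts the reading frame and exposes a premature stop codon. They do not reproduce the alpaca virus ORF4.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def sense_codons(orf):
    """Count codons from the first base up to the first in-frame stop."""
    orf = orf.upper()
    count = 0
    for i in range(0, len(orf) - 2, 3):
        if orf[i:i + 3] in STOP_CODONS:
            return count
        count += 1
    return count  # no in-frame stop found


def insert_base(orf, position, base):
    """Insert `base` so that it occupies 1-based `position`."""
    return orf[:position - 1] + base + orf[position - 1:]


if __name__ == "__main__":
    intact = "ATGGCTGATAGCTTTGAACGTTAA"       # toy ORF with 7 sense codons
    shifted = insert_base(intact, 7, "C")     # single cytosine insertion
    print(sense_codons(intact), sense_codons(shifted))
```

Removing the inserted base restores the original frame and the full-length ORF, which mirrors the reasoning applied to the alpaca virus sequence.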

The second difference was a considerably longer S1 portion of the bat 229E-related CoV Spike genes compared to HCoV-229E. Figure 4 shows that the three bat lineages contained 185-404 additional aa residues upstream of the putative receptor binding domain (44, 45) compared to HCoV-229E. Bat lineage 1, which was phylogenetically most closely related to HCoV-229E, carried the smallest number of additional aa residues. Of note, the alpaca 229E-related CoV was identical to HCoV-229E in the number of aa residues within this region of the Spike gene.

The third major difference was the existence of an additional putative ORF downstream of the Nucleocapsid gene in all bat viruses. Non-homologous ORFs of unknown function downstream of the Nucleocapsid occur in several alpha- and betacoronaviruses, including Feline infectious peritonitis virus (FIPV), Transmissible gastroenteritis virus of swine (TGEV), Rhinolophus bat CoV HKU2, Scotophilus bat CoV 512, Miniopterus bat CoV HKU8 (23), the Chaerephon bat CoVs BtKY22/BtKY41, the Cardioderma bat CoV BtKY43 (46) and bat CoV HKU10 from Chinese Hipposideros and Rousettus species (47). In the genus Betacoronavirus, only Bat CoV HKU9 from Rousettus and the genetically related Eidolon bat CoV BtKY24 (46) carry additional ORFs at this genomic position. No ORF in the 3’-terminal genome region is known from HCoV-229E. The alpaca 229E-related CoV contains an ORF at this position termed ORFX by Crossley et al. (32). In analogy to consecutive numbers used to identify HCoV-229E ORFs, we refer to this ORF as ORF8 hereafter. The putative TRS context preceding ORF8 was conserved in all bat 229E-related CoVs and in the alpaca 229E-related CoV, suggesting that a corresponding subgenomic mRNA8 may exist. The 3’-UTR of bat 229E-related CoVs immediately followed the putative ORF8. This was supported by the existence of a conserved octanucleotide sequence and highly conserved stem elements forming part of the pseudo-knot typically located at the 5’-end of alphacoronavirus 3’-UTRs (48). As shown in Figure 5, HCoV-229E shows a high degree of sequence conservation compared to bat 229E-related CoVs and the alpaca 229E-related CoV in this genomic region, including a highly conserved putative TRS. Bioinformatic analyses (49-51) provided evidence for the presence of two transmembrane domains in the predicted proteins 8 of the alpaca and the genetically related bat 229E-related viruses. This may imply a role of the predicted protein 8 in coronaviral interactions with cellular or viral membranes.

As shown in Figure 5, one of the bat 229E-related CoV lineages represented by virus KW2E-F56 contained a highly divergent ORF8. In protein BLAST comparisons, the KW2E-F56 ORF8 showed limited similarity to the putative ORF7b of HKU10 and to the putative ORF8 located upstream of the Nucleocapsid of a Nigerian Hipposideros betacoronavirus termed Zaria bat CoV (47, 52). This may hint at cross-genus recombination events between different hipposiderid bat CoVs in the past. However, overall aa sequence identity between these bat CoV ORFs was very low with maximally 28.2%. As shown in Figure 6, only the central part of these ORFs contained a stretch of 46 more conserved aa residues showing up to 39.1% sequence identity and 47.8% similarity (Blosum62 matrix). The origin and function of the divergent ORF8 thus remain to be determined.

Discussion


We characterize highly diverse bat CoVs on a full genome level and show that these viruses 
form one species together with HCoV-229E and a recently described virus from alpacas (32). 
We analyze the genomic differences between human, bat and alpaca 229E-related CoVs to 
elucidate potential host transitions during the formation of HCoV-229E. 

A major difference between bat 229E-related CoVs and HCoV-229E was the Spike deletion 
in HCoV-229E compared to the bat viruses. Interestingly, the bat 229E-related CoV lineage 1 
which was phylogenetically most related to HCoV-229E also carried the smallest number of 
additional aa residues. Most chiropteran CoVs are restricted to the gastrointestinal tract, 
whereas HCoVs mainly replicate in the respiratory tract (2). The Spike deletion in HCoV- 
229E compared to ancestral bat viruses is thus noteworthy, since deletions in this protein have 
been associated with changes in coronaviral tissue tropism. This is best illustrated by TGEV, 
whose full-length Spike variants are associated with a dual tropism for respiratory and enteric 
tract, whereas the deleted variant termed porcine respiratory CoV (PRCV) mainly replicates 
in the respiratory tract (53). One could hypothesize that adaptation of bat 229E-related CoV 
lineage 1 to both non-chiropteran hosts and to respiratory transmission may have been easier 
compared to the other bat 229E-related CoV lineages. 

Because the exact aa residues of the HCoV-229E RBD conveying cell entry are not known, it 
is difficult to predict whether the bat viruses may interact with the HCoV-229E cellular 
receptor Aminopeptidase N (45) or its Hipposideros homologue. Characterization of this bat 
molecule and identification of permissive cell culture systems may allow initial susceptibility 
experiments for chimeric viruses. Of note, although the alpaca 229E-related CoV was 
successfully isolated (32), no data on receptor usage and cellular tropism are available so far 
(2, 53). 

Another major difference was the existence of an ORF8 downstream of the Nucleocapsid gene in bat 229E-related viruses and the detection of putative sequence remnants of this ORF in HCoV-229E. Hypothetically, deterioration of ORF8 in HCoV-229E could have occurred due to loss of gene function in human hosts after zoonotic transmission from bats or intermediate hosts. This may parallel gradual deletions in the SARS-CoV accessory ORF8 during the human epidemic compared to bat SARS-related CoVs (54) and is consistent with characterizations of HCoV-229E clinical strains showing high variability of this genomic region (55).

The virus-host association between 229E-related CoVs and the bat genus Hipposideros is strengthened by our virus detections in Hipposideros species in Ghana and in Gabon (41), which is separated from Ghana by about 1,800 km. The observed link between 229E-related alphacoronaviruses and hipposiderid bats is paralleled by the detections of genetically closely related betacoronaviruses in different Hipposideros species from Ghana, Nigeria, Thailand and Gabon (33, 41, 52, 56), suggesting restriction of these CoVs to hipposiderid bat genera. Due to their proofreading capacity, CoVs show evolutionary rates of 10^-5 to 10^-6 substitutions per site per replication cycle, which is much slower than rates observed for other RNA viruses (57, 58). Our data thus suggest a long evolutionary history of 229E-related CoVs in Old World hipposiderid bats that greatly exceeds that of HCoV-229E in humans, confirming previous hypotheses from our group (33).

The putative role of the alpaca 229E-related CoV in the formation of HCoV-229E is unclear. Our data enable new insights into the evolutionary history of HCoV-229E. First, the alpaca 229E-related CoV contained an intact ORF8 which was genetically related to the homologous gene in bat 229E-related CoVs. Second, genes of the alpaca CoV clustered either with bat viruses only or in intermediate position between bat viruses and HCoV-229E. Because the alpaca 229E-related CoV showed the same deletion in its Spike gene as HCoV-229E compared to bat 229E-related CoVs, it may be possible that alpacas represent a first host switch from bats followed by a second inter-host transfer from alpacas to humans. The relatedness of the alpaca 229E-related CoV to older HCoV-229E strains rather than to contemporary ones reported by Crossley et al. would be compatible with this scenario (32). However, the alpaca 229E-related CoV was reported only from captive animals in the U.S. and whether this virus is indeed endemic in New World alpacas is unclear. Additionally, the apparent intra-Spike recombination event may speak against a role of the alpaca virus as the direct ancestor of HCoV-229E. Further analyses will be required to confirm this putative recombination event, ideally including additional sequence information from old HCoV-229E strains. Furthermore, a hypothetical direct transfer of Old World bat viruses to New World alpacas appears geographically unfeasible. It would be highly relevant to investigate Old World camelids for 229E-related CoVs that may have been passed on to captive alpacas and that may represent direct ancestors of HCoV-229E.

417 Additional constraints to consider in the hypothetical role of camelids for the evolutionary 

418 history of 229E-related CoVs is the time and place of putative host switches from bats. 

419 Camels were likely introduced to Africa not earlier than 5,000 years ago from the Arabian 

420 Peninsula (59, 60) and could not possibly come into direct contact with West African H. cf. 

421 ruber or H. abae of the Guinean savanna. The majority of CoV species seems to be confined 

422 to host genera (2). Therefore, it may be possible that 229E-related CoV transmission was 

423 mediated through closely related species like H. tephrus, which occurs in the Sahel zone and 

424 comes into contact to populations of H. cf. ruber distantly related to those from the Guinean 

425 savanna (61). This bat species should be analyzed for 229E-related CoVs together with other 

426 genera of the family Hipposideridae, like Asellia or Triaenops, which are desert-adapted bats 

427 sharing their habitat with camelids both in Arabia and Africa and may harbor genetically 

428 related CoVs. An important parallel to this evolutionary scenario is the role of camelids for 
429 the emerging MERS-CoV (30, 62), whose likely ancestors also occur in bats (20, 21). 

430 However, we cannot rule out that the alpaca 229E-related CoV and HCoV-229E represent 

431 two independent zoonotic acquisitions from 229E-related CoVs existing in hipposiderid bats 

432 and potentially yet unknown intermediate hosts. 

433 

434 The existence of different serotypes in the expanded 229E-related CoV species is unclear. 

435 CoV neutralization is mainly determined by antibodies against the S protein, and particularly 

436 the S1 domain (63). The phylogenetic relatedness of the S1 domains from the alpaca 229E- 

437 related CoV and HCoV-229E suggests that these viruses form one serotype. The most closely 

438 related bat 229E-related CoV lineage showed 8.4% aa sequence distance in the translated 

439 Spike gene from HCoV-229E. This was comparable to the 7.8-18.6% aa distance between 

440 FIPV, TGEV and canine CoV, which belong to one CoV species (Alphacoronavirus 1) and 

441 for which cross-neutralization was observed (64). The approximately 30% Spike aa sequence distance 

442 between the other bat 229E-related lineages and HCoV-229E was comparable to the distance 

443 between HCoV-NL63 and HCoV-229E, which form two different serotypes (65). HCoV- 

444 229E thus likely forms one serotype that includes the alpaca 229E- and potentially the most 

445 closely related bat 229E-related lineage, while the other bat 229E-related lineages may form 

446 different serotypes. In our study, lack of bat sera and absence of bat 229E-related CoV 

447 isolates prevented serological investigations. The generation of pseudotyped viruses carrying 

448 bat 229E-related Spike motifs may allow future serological studies. Of note, our joint analyses 

449 of Ghanaian patients with respiratory disease in this study and previous work from our group 

450 investigating Ghanaian villagers (66) showed that Ghanaians were infected with the globally 

451 circulating HCoV-229E, whereas no evidence of bat 229E-related CoV infecting humans was 

452 found. If serotypes existed in 229E-related CoVs, serologic studies may thus help to elucidate 

453 putative exposure of humans and potential camelid intermediate hosts to these bat viruses. 
454 It should be noted that throughout Africa, bats are consumed as wild game (67) and humans 

455 frequently live in close proximity to bat caves (68), including usage of bat guano as fertilizer 

456 and drinking water from these caves (21). These settings potentially facilitate the exposure of 

457 humans and their peri-domestic animals, including camelids, to these previously remote bat 

458 viruses. 

459 In summary, HCoV-229E may be a paradigmatic example of the successful introduction of a 

460 bat CoV into the human population, possibly with camelids as intermediate hosts. 

461 
725 Figure legends 

726 Figure 1. Phylogenetic relationships of the genus Alphacoronavirus, HCoV-229E strains 

727 and the novel bat viruses 

728 A, Bayesian phylogeny of an 816 nucleotide RdRp gene sequence fragment corresponding to 

729 positions 13,891-14,705 in HCoV-229E prototype strain inf-1 (GenBank accession no. 

730 NC_002645) using a GTR+G+I substitution model. SARS-coronavirus (CoV) was used as an 

731 outgroup. Viruses with additional sequence information generated in this study were marked 

732 with circles (full genome) or marked with triangles (Spike gene). Bat viruses detected in our 

733 previous studies from Ghana (33) and Gabon are given in cyan (41). B, Neighbour-joining 

734 phylogeny of the same RdRp gene fragment with a nucleotide percentage distance substitution 

735 model and the complete deletion option. The tree was rooted against HCoV-NL63. Viruses 

736 were coloured according to their origin. C, Bayesian phylogeny of the full Spike gene of bat 

737 229E-related CoVs, the alpaca 229E-related CoV and HCoV-229E strains identified with 

738 GenBank accession numbers and year of isolation, using a WAG amino acid substitution 

739 model and HCoV-NL63 as an outgroup. The novel bat 229E-related CoVs are shown in 

740 boldface and red. Branches leading to the outgroup were truncated for graphical reasons as 

741 indicated by slashed lines. Values at nodes show support of grouping from posterior 

742 probabilities or 1,000 bootstrap replicates (only values above 0.7 were shown). 

743 

744 Figure 2. Genome organization of 229E-related coronaviruses and relationships between 

745 viruses from bats and humans 

746 A, 229E-related CoV genomes represented by black lines; ORFs are indicated by grey arrows. 

747 Locations of transcription-regulatory core sequences (TRS) are marked by black dots. HCoV- 

748 NL63 is shown for comparison. B, Similarity plots generated with SSE V1.1 (38) using a 

749 sliding window of 400 and a step size of 40 nucleotides (nt). The HCoV-229E prototype 

750 strain inf-1 was used with animal viruses identified in the legend. 
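
The sliding-window similarity scan described for panel B can be sketched as follows. This is a minimal illustration and not the SSE implementation: the function name `window_identity` and the toy sequences are assumptions, the inputs are assumed to be pre-aligned to the same length, and identity is computed simply as the fraction of matching positions, skipping columns where either sequence has a gap. The defaults mirror the window of 400 nt and step of 40 nt given in the legend.

```python
def window_identity(ref, query, window=400, step=40):
    """Percent identity of two pre-aligned sequences in sliding windows.

    Returns a list of (window_start, percent_identity) tuples with 0-based
    window starts. Columns where either sequence has a gap ('-') are
    skipped; a window without comparable columns yields None.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(ref) - window + 1, step):
        # Keep only alignment columns where both sequences have a residue.
        pairs = [
            (a, b)
            for a, b in zip(ref[start:start + window], query[start:start + window])
            if a != "-" and b != "-"
        ]
        if pairs:
            matches = sum(a == b for a, b in pairs)
            results.append((start, 100.0 * matches / len(pairs)))
        else:
            results.append((start, None))
    return results


# Toy example with a short window so the output is easy to follow.
profile = window_identity("ACGTACGTACGT", "ACGTACGAACGT", window=4, step=4)
```

Plotting the second element of each tuple against the window start gives a similarity profile along the alignment, which is the kind of curve the panel displays.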
751 

752 Figure 3. Bayesian phylogenies of major open reading frames and recombination 

753 analysis of HCoV-229E and related animal viruses 

754 A, Phylogenies were calculated with a WAG amino acid substitution model. The novel bat 

755 viruses are shown in red. The alpaca CoV is shown in cyan. Filled circles, posterior 

756 probability support exceeding 0.95, scale bar corresponds to genetic distance. Details on the 

757 origin of HCoV-229E strain VFC408 which was generated for this study can be retrieved 

758 from (69). Branches leading to the outgroup HCoV-NL63 were truncated for graphical reasons. 

759 B, Bootscan analysis using the Jukes-Cantor algorithm with a sliding window of 1,500 and a 

760 step size of 300 nt. The HCoV-229E inf-1 strain was used with animal 229E-related viruses as 

761 identified in the legend. C, Phylogenies of the S1 and S2 subunits were calculated according to 

762 A. One representative HCoV-229E strain was selected per decade according to (70); GenBank 

763 accession nos. DQ243974, DQ243964, DQ243984, DQ243967. 

764 

765 Figure 4. Amino acid sequence alignment of the 5’-end of the Spike gene of HCoV-229E 

766 and related animal viruses 

767 Amino acid alignment of the first part of the Spike gene of 229E-related CoVs including four 

768 bat 229E-related CoVs, the alpaca 229E-related CoV and the HCoV-229E inf-1 strain. 

769 Conserved amino acid residues are marked in black, sequence gaps are represented by 

770 hyphens. 

771 

772 Figure 5. Nucleotide sequence alignment of the genomic 3’-end of HCoV-229E and 

773 related animal viruses 

774 Nucleotide alignment of the genome region downstream of the Nucleocapsid gene including 

775 four bat 229E-related CoVs, the alpaca 229E-related CoV and representative HCoV-229E full 

776 genomes identified with GenBank accession number or strain name. Dots represent identical 
777 nucleotides, hyphens represent sequence gaps. Grey bars above alignments indicate open 

778 reading frames and the beginning of the poly-A tail. The putative start and stop codon of 

779 ORF8 is labelled lime green, the corresponding putative TRS element is marked blue. The 

780 conserved genomic sequence elements and the highly conserved stem elements forming part 

781 of the pseudo-knot (PK) were marked with grey and purple background. 

782 

783 Figure 6. Amino acid sequence alignment of the putative ORF8 from a bat 229E-related 

784 coronavirus and closest hits from two other hipposiderid bat coronaviruses 

785 Conserved amino acid residues between sequence pairs are highlighted in color according to 

786 amino acid properties, sequence gaps are represented by hyphens. The central domain 

787 showing higher sequence similarity between compared viruses is boxed for clarity. The 229E- 

788 related alphacoronavirus KW2E-F56 from a Hipposideros cf. ruber detected in this study is 

789 given in red, the alphacoronavirus HKU10 originated from a Chinese H. pomona, the 

790 betacoronavirus Zaria originated from a Nigerian H. gigas. 

791 
Table 1. Overview of bats tested for 229E-related coronaviruses in Ghana 

Species                    n       Positives (%)
Coleura afra               68      0
Hipposideros abae          242     19 (7.8)
H. cf. gigas               12      0
H. jonesi                  5       0
H. cf. ruber               1,611   62 (3.8)
Nycteris cf. gambiensis    91      0
Rhinolophus alcyone        4       0
R. landeri                 9       0
Taphozous perforatus       21      0
Lissonycteris angolensis   20      0
Rousettus aegyptiacus      4       0
Total                      2,087   81 (3.9)
Table 2. Coding capacity for the putative non-structural proteins of the novel bat 229E-related coronaviruses 

        KW2E-F151                F01A-F2                  AT1A-F1                  KW2E-F56
Protein First to last aa  Size   First to last aa  Size   First to last aa  Size   First to last aa  Size
NSP1    Met1-Gly111       111    Met1-Gly111       111    Met1-Gly111       111    Met1-Gly109       109
NSP2    Asn112-Gly897     786    Asn112-Gly897     786    Asn112-Gly897     786    Asn110-Gly895     786
NSP3    Gly898-Ala2494    1597   Gly898-Ala2494    1597   Gly898-Ala2492    1595   Gly896-Ala2489    1594
NSP4    Gly2495-Gln2975   481    Gly2495-Gln2975   481    Gly2493-Gln2973   481    Gly2490-Gln2970   481
NSP5    Ala2976-Gln3277   302    Ala2976-Gln3277   302    Ala2974-Gln3275   302    Ala2971-Gln3272   302
NSP6    Ser3278-Gln3556   279    Ser3278-Gln3556   279    Ser3276-Gln3553   278    Ser3273-Gln3551   279
NSP7    Ser3557-Gln3639   83     Ser3557-Gln3639   83     Ser3554-Gln3636   83     Ser3552-Gln3634   83
NSP8    Ser3640-Gln3834   195    Ser3640-Gln3834   195    Ser3637-Gln3831   195    Ser3635-Gln3829   195
NSP9    Asn3835-Gln3943   109    Asn3835-Gln3943   109    Asn3832-Gln3940   109    Asn3830-Gln3938   109
NSP10   Ala3944-Gln4078   135    Ala3944-Gln4078   135    Ala3941-Gln4075   135    Ala3939-Gln4073   135
NSP11   Ser4079-Glu4097   19     Ser4079-Glu4097   19     Ser4076-Glu4094   19     Ser4074-Glu4092   19
NSP12   Ser4079-Gln5005   927    Ser4079-Gln5005   927    Ser4076-Gln5002   927    Ser4074-Gln5000   927
NSP13   Ala5006-Gln5602   597    Ala5006-Gln5602   597    Ala5003-Gln5599   597    Ala5001-Gln5597   597
NSP14   Ser5603-Gln6120   518    Ser5603-Gln6120   518    Ser5600-Gln6117   518    Ser5598-Gln6115   518
NSP15   Gly6121-Gln6468   348    Gly6121-Gln6468   348    Gly6118-Gln6465   348    Gly6116-Gln6463   348
NSP16   Ser6469-Lys6768   300    Ser6469-Lys6768   300    Ser6466-Lys6766   301    Ser6464-Lys6763   300
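
The protein sizes in Table 2 follow from the first and last residue positions as last minus first plus one. A minimal sketch of that arithmetic is given below; the function name `protein_size` and the text format it parses (a three-letter residue code followed by its position, with optional spaces) are assumptions made for illustration.

```python
import re

# Three-letter residue code, position, hyphen, residue code, position.
# Optional whitespace is tolerated around each element.
_SPAN = re.compile(r"\s*[A-Za-z]{3}\s*(\d+)\s*-\s*[A-Za-z]{3}\s*(\d+)\s*")


def protein_size(span):
    """Number of residues in an inclusive span such as 'Gly896-Ala2489'."""
    match = _SPAN.fullmatch(span)
    if match is None:
        raise ValueError(f"unrecognised span: {span!r}")
    first, last = int(match.group(1)), int(match.group(2))
    return last - first + 1


# Spans taken from Table 2 (NSP3 of KW2E-F56, NSP16 of AT1A-F1).
nsp3_size = protein_size("Gly896-Ala2489")
nsp16_size = protein_size("Ser6466-Lys6766")
```

Running the same check over every cell is a quick way to confirm that neighbouring proteins abut without gaps or overlaps, apart from the shared start of NSP11 and NSP12.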
Table 3. Comparison of amino acid identities of seven conserved replicase domains of the bat 229E-related coronaviruses, HCoV-229E and the alpaca 229E-related coronavirus for species delineation 

Percentage amino acid sequence identity 

                        Within         Human coronavirus 229E (a) vs.                      ACoV (c) vs.
Domains                 bat 229E (b)   KW2E-F56     AT1A-F1      KW2E-F151    F01A-F2      bat 229E (b)
ADRP                    75.6-100       75-75.6      91.1-92.9    84.5-85.1    84.5-85.1    76.8-90.5
NSP5 (3CLpro)           90.7-100       90.4-90.7    97.4-97.7    96.4-96.7    97.4-97.7    90.4-97.4
NSP12 (RdRp)            97.5-100       95.7-96      97.3-97.6    96.9-97.3    97.2-97.7    97.3-98.9
NSP13 (NTPase/Hel)      97.2-100       96.5-97.2    97.2-97.8    97.3-98      98-98.7      97.8-99.3
NSP14 (ExoN/N7-MTase)   96.1-100       95-95.6      97.5-98.1    97.3-97.9    96.9-97.5    96.3-99.2
NSP15 (NendoU)          92.8-100       92.2         96.3-96.6    96.6-96.8    96.8-97.1    91.4-96.8
NSP16 (O-MT)            91.7-100       90.7-91      91.7-92      97.3-97      97.3-97.7    90.7-98.0
Concatenated domains    94.5-100       93.3-93.6    96.4-96.8    96.4-96.7    96.7-97.1    94.2-97.8

(a) including HCoV-229E inf-1, HCoV-229E 0349, HCoV-229E J0304; 
(b) including bat CoV KW2E-F56, AT1A-F1, KW2E-F151 and F01A-F2; 
(c) ACoV, alpaca coronavirus 

GenBank accession numbers of reference sequences: HCoV-229E inf-1: NC_002645.1; HCoV-229E 0349: JX503060; HCoV-229E J0304: JX503061; 
alpaca CoV (ACoV): JQ410000 
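
How the identities in Table 3 bear on species delineation can be sketched as a simple threshold check. The sketch assumes the commonly cited ICTV demarcation criterion of more than 90% amino acid identity in the conserved replicase domains; that threshold and the names `SPECIES_THRESHOLD` and `same_species` are assumptions made for illustration and are not stated in the table. The ranges are the concatenated-domain values from Table 3.

```python
SPECIES_THRESHOLD = 90.0  # assumed demarcation threshold, percent aa identity

# (lowest, highest) percent identity over the concatenated domains (Table 3).
concatenated = {
    "within bat 229E-related CoVs": (94.5, 100.0),
    "HCoV-229E vs KW2E-F56": (93.3, 93.6),
    "HCoV-229E vs AT1A-F1": (96.4, 96.8),
    "HCoV-229E vs KW2E-F151": (96.4, 96.7),
    "HCoV-229E vs F01A-F2": (96.7, 97.1),
    "alpaca CoV vs bat 229E-related CoVs": (94.2, 97.8),
}


def same_species(ranges, threshold=SPECIES_THRESHOLD):
    """True if the lowest identity of every comparison exceeds the threshold."""
    return all(low > threshold for low, _high in ranges.values())


single_species = same_species(concatenated)
```

Under this assumed threshold every comparison stays above the cutoff, consistent with treating the human, alpaca and bat viruses as one species.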
Table 4. Amino acid identity between open reading frames of human, bat and camelid 229E-related coronaviruses 

Percentage amino acid sequence identity 

              Human coronavirus 229E (a) vs.                              Within       ACoV (c) vs.
ORF           KW2E-F151   F01A-F2     AT1A-F1     KW2E-F56    ACoV        bat CoV (b)  bat CoV (b)
ORF1a         89.5-89.9   89.5-89.8   92.6-93.1   84.1-84.6   92.9-93.3   83.8-97.9    85.1-93.5
ORF1ab        92.5-92.9   92.6-93     94.2-94.6   88.3-88.8   94.6-95     88.7-98.3    89.3-95.2
Spike         87.5-91.6   87.4-91.4   67.2-68.9   67.2-69.1   92.8-94.4   66.8-92.4    69.7-90.8
ORF4          92.4-93.1   92.6-93.2   77.3-78.8   71.2-73.6   79.7-78.1   75.7-96.4    67.2-82.8
Envelope      89.6-90.9   89.6-90.9   77.6-78.9   78.7-80     89.6-90.9   77.3-98.7    77.3-100
Membrane      90.2-90.7   89.3-89.9   86.2-86.7   87.1-87.6   89.8-90.2   86.7-98.7    86.3-99.1
Nucleocapsid  90.7-92     90.2-91.5   88.6-90.4   75.8-76.6   88.4-89.7   78.7-99.5    78.2-94
ORFX/8        -           -           -           -           -           12.5-100     15.2-83.9

(a) including HCoV-229E inf-1, HCoV-229E 0349, HCoV-229E J0304; 
(b) including bat CoV KW2E-F56, AT1A-F1, KW2E-F151, F01A-F2; 
(c) ACoV, alpaca coronavirus 
Table 5. Putative transcription regulatory sequences of the novel bat 229E-related coronaviruses and HCoV-229E 

Gene          Virus            TRS position  Sequence             AUG position
Leader        HCoV-229E/inf-1  62            UCUCAACUAAACN220     293
              KW2E-F151        62            UCUCAACUAAACN220     293
              F01A-F2          62            UCUCAACUAAACN220     293
              AT1A-F1          62            UCUCAACUAAACN220     293
              KW2E-F56         62            UCUCAACUAAACN220     293
Spike         HCoV-229E/inf-1  20571         UCUCAACUAAAUAAA      20586
              KW2E-F151        20585         UCUCAACUAAAUAAA      20600
              F01A-F2          20585         UCUCAACUAAAUAAA      20600
              AT1A-F1          20576         UCUCAACUAAAAA        20589
              KW2E-F56         20570         UCUCAACUAAGUA        20583
ORF4          HCoV-229E/inf-1  24054         UCAACUAAAN39         24101
              KW2E-F151        24644         UCAACUAAACN38        24691
              F01A-F2          24638         UCAACUAAACN38        24685
              AT1A-F1          25290         UCAACUAAACN38        25337
              KW2E-F56         25258         UCAACUAAACN38        25304
Envelope      HCoV-229E/inf-1  24599         UCUCAACUAAN152       24762
              KW2E-F151        25190         UCUCAACUAACN149      25349
              F01A-F2          25184         UCUCAACUAACN149      25343
              AT1A-F1          25836         UCUCAACUAACN149      25992
              KW2E-F56         25805         UCAACUAACN131        25962
Membrane      HCoV-229E/inf-1  24991         UCUAAACUAAACGACA     25007
              KW2E-F151        25578         UCUAAACUAAACGACA     25594
              F01A-F2          25572         UCUAAACUAAACGACA     25588
              AT1A-F1          26224         UCUAAACUAAACG        26237
              KW2E-F56         26185         UCUAAACUAAACG        26198
Nucleocapsid  HCoV-229E/inf-1  25680         UCUAAACUGAACGAAAAG   25698
              KW2E-F151        26270         UCUAAACUGAACGAAAAG   26288
              F01A-F2          26264         UCUAAACUGAACGAAAAG   26282
              AT1A-F1          26934         UCUAAACUGAACGAAAACC  26953
              KW2E-F56         26874         UCUAAACUGAACGAAAACC  26893
ORF8          HCoV-229E/inf-1  -             -                    -
              KW2E-F151        27468         UCAACUAAAC           27478
              F01A-F2          27462         UCAACUAAAC           27472
              AT1A-F1          28130         UCAACUAAAC           28141
              KW2E-F56         28124         UCAACUAAAC           28134

TRS position: genome position of the first residue of the putative TRS sequence; AUG position: genome position of the first base of the start codon; N followed by a number: number of base residues between the end of the putative TRS sequence and the start codon (where applicable) 
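
Locating a putative TRS and the next downstream start codon, as tabulated in Table 5, can be sketched in a few lines. This is an illustrative search only: the function name `find_trs_sites`, the `max_gap` cutoff and the toy sequence are assumptions, the core motif UCAACUAAAC is the one listed for ORF4 and ORF8, and the position convention used here (1-based, with the gap counted from the end of the core motif) is not necessarily the one used in the table.

```python
def find_trs_sites(genome, core="UCAACUAAAC", max_gap=250):
    """Return (trs_pos, aug_pos, gap) for each core motif followed by an AUG.

    Positions are 1-based. `gap` is the number of bases between the end of
    the core motif and the first downstream AUG. Motifs without an AUG
    starting within `max_gap` bases of their end are omitted.
    """
    genome = genome.upper().replace("T", "U")  # accept DNA or RNA input
    sites = []
    start = genome.find(core)
    while start != -1:
        end = start + len(core)
        # Restrict the search so that the AUG starts at most max_gap bases
        # after the end of the core motif.
        aug = genome.find("AUG", end, end + max_gap + 3)
        if aug != -1:
            sites.append((start + 1, aug + 1, aug - end))
        start = genome.find(core, start + 1)
    return sites


# Toy sequence: two leading bases, the core motif, then a start codon.
sites = find_trs_sites("GGUCAACUAAACAUGCCC")
```

A real annotation would also have to consider variant core sequences, such as the UCUAAACUAAAC form listed for the Membrane gene, which this exact-match search does not cover.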












